kohya-ss lora support #295
Conversation
Force-pushed from 3570f32 to a8cd34e (compare)
It looked like I had pushed up half-broken code. It should now be in a working state. I'm not sure why the global
Work was started to enable reusing the existing transformer, which would have shaved off ~30 seconds per generation on my local machine if there were no changes to the base model or LoRA weights, but it would have required additional work that I do not currently have the time to put in.
Force-pushed from 4b4c2e1 to b130f16 (compare)
@colinurbs I went ahead and hacked in a manager to enable reuse of the existing transformer when there are no changes to the model or any weights. Without this additional change there is around 30-45 seconds of load time for the LoRAs while using kohya_ss's implementation. It's pretty nice so far. These were generated at 256x256 for testing, but the LoRAs seem to be performing some heavy lifting. 250707_231212_812_2865_9.mp4 250707_231155_707_4831_9.mp4
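A rough sketch of the reuse check described above (illustrative only; the class and method names are assumptions, not the actual manager in this PR):

import typing

class TransformerReuseCache:
    """Keep the loaded transformer while the base model and LoRA weights are unchanged."""

    def __init__(self) -> None:
        self._transformer: typing.Any = None
        self._fingerprint: typing.Any = None

    def get(self, model_name: str, lora_weights: dict, build_fn: typing.Callable):
        # Fingerprint the inputs that would invalidate the loaded transformer.
        fingerprint = (model_name, tuple(sorted(lora_weights.items())))
        if self._transformer is None or fingerprint != self._fingerprint:
            # Cache miss: rebuild/reload (this is the ~30-45 s cost mentioned above).
            self._transformer = build_fn(model_name, lora_weights)
            self._fingerprint = fingerprint
        return self._transformer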
This is fantastic work, thank you so much. I see you've left it as a draft. Is there any reason I shouldn't merge this into develop and start testing it?
@colinurbs no reason from my side to not merge into develop, I'll remove the draft status.
Force-pushed from 590cb22 to 1e6a32c (compare)
modules/pipelines/worker.py (Outdated)

if current_generator is not None and current_generator.transformer is not None:
    offload_model_from_device_for_memory_preservation(
        current_generator.transformer, target_device=gpu, preserved_memory_gb=settings.get("gpu_memory_preservation", 8.0))
In the inherited FP demo code from Illyasviel, this was explicitly preserved at 8GB, unlike most preservations which used his Setting, which defaulted to 6GB. I don't know if this change is an issue, but it should be tested. (There's a second similar change below.)
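For reference, a minimal illustration of the behavioural difference being flagged (the settings value is an assumption, not the actual worker.py code): if the gpu_memory_preservation setting defaults to 6.0 elsewhere, the new call preserves less memory than the demo's hard-coded 8 GB.

# Illustrative only: stand-in for the Settings object discussed above.
settings = {"gpu_memory_preservation": 6.0}  # assumed user setting / default

# Inherited FP demo behaviour: always preserve 8 GB at this call site.
preserved_gb_demo = 8.0

# New behaviour in this PR: read the user setting (falling back to 8.0 here).
preserved_gb_new = settings.get("gpu_memory_preservation", 8.0)

print(preserved_gb_demo, preserved_gb_new)  # 8.0 vs 6.0 with the assumed setting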
modules/pipelines/worker.py (Outdated)

offload_model_from_device_for_memory_preservation(studio_module.current_generator.transformer, target_device=gpu, preserved_memory_gb=8)
current_generator.move_lora_adapters_to_device(cpu)
offload_model_from_device_for_memory_preservation(
    current_generator.transformer, target_device=gpu, preserved_memory_gb=settings.get("gpu_memory_preservation", 8.0))
In the inherited FP demo code from Illyasviel, this was explicitly preserved at 8GB, unlike most preservations which used his Setting, which defaulted to 6GB. I don't know if this change is an issue, but it should be tested. (There's a second similar change above.)
modules/pipelines/worker.py (Outdated)

studio_manager.current_generator = current_generator = new_generator
# Load the transformer model
current_generator.load_model()
It's not clear why we load here without moving the transformer to the GPU, but on line 296 a pre-existing transformer does go to the GPU.
modules/pipelines/worker.py (Outdated)

    f"Worker: AFTER model assignment, current_generator is {type(current_generator)}, id: {id(current_generator)}")
if current_generator:
    print(
        f"Worker: current_generator.transformer is {type(current_generator.transformer)}. load_model() will be called next.")
The model is already loaded above.
I think this was just some print debugging. I'll remove the noise.
Keeps downloading model files even when it has already found them...:
The first commit of this has been merged into develop and seems to be working well so far.
This is loading the model from disk, not downloading. The progress bar is a function of tqdm.
@colinurbs you were just a bit too quick for me :D I was working on minimizing the changes and exposing the settings in https://github.com/arledesma/temp-FramePack-Studio/commits/feature/kohya-ss-loraready, which is from a couple of hours ago, around the time that you merged the commit to develop. I have https://github.com/arledesma/temp-FramePack-Studio/commits/feature/kohya-ss-loraready-develop/ merged with the current develop. Providing the choice for the loader and model reuse in the settings page seemed like an easier A/B test. Regardless, I'm good with anything that you want to do over here. 👍🏽
https://github.com/arledesma/temp-FramePack-Studio/commits/feature/kohya-ss-loraready-develop/ is currently unloading the LoRAs when reusing the model transformer, so only the first generation will apply LoRA weights. Definitely not in a desirable state.
@arledesma thanks for this. I'm going to roll back develop and merge this whole thing in this evening. I haven't had time to dig into the code, did you add an exception for already downloaded models to prevent it from forcing existing users to re-download?
@colinurbs I have yet to replicate any models being re-downloaded. I've attempted multiple times with different settings and even reinstalled the entire repository without experiencing the issue. Could it just be what is mentioned in #295 (comment), where loading from disk is being misinterpreted as downloading? If so then maybe we can update the wording in lora_utils.load_safetensors_with_fp8_optimization() to use something other than "Loading" in the progress description:

def load_safetensors_with_fp8_optimization(
    model_files: list[str],
    fp8_optimization: bool,
    device: torch.device,
    weight_hook: Callable | None = None,
) -> dict[str, torch.Tensor]:
    """
    Load state dict from safetensors files and merge LoRA weights into the state dict with fp8 optimization if needed.
    """
    state_dict = {}
    if fp8_optimization:
        raise RuntimeWarning("FP8 optimization is not yet supported in this version.")
        from .fp8_optimization_utils import (
            optimize_state_dict_with_fp8_on_the_fly,
        )

        # Optimization targets and exclusion keys
        TARGET_KEYS = ["transformer_blocks", "single_transformer_blocks"]
        EXCLUDE_KEYS = [
            "norm"
        ]  # Exclude norm layers (e.g., LayerNorm, RMSNorm) from FP8

        print(f"FP8: Optimizing state dictionary on the fly")
        # Optimized state dictionary in FP8 format
        state_dict = optimize_state_dict_with_fp8_on_the_fly(
            model_files,
            device,
            TARGET_KEYS,
            EXCLUDE_KEYS,
            move_to_device=False,
            weight_hook=weight_hook,
        )
    else:
        from .safetensors_utils import MemoryEfficientSafeOpen

        state_dict = {}
        for model_file in model_files:
            with MemoryEfficientSafeOpen(model_file) as f:
                for key in tqdm(
                    f.keys(),
                    desc=f"Loading {os.path.basename(model_file)}",
                    leave=False,
                ):
                    value = f.get_tensor(key)
                    if weight_hook is not None:
                        value = weight_hook(key, value)
                    state_dict[key] = value

    return state_dict
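As one possible follow-up to the wording concern, a tiny illustration (the path and keys below are placeholders, not committed code) of making the tqdm description explicit that this is a disk read rather than a download:

import os
from tqdm import tqdm

# Hypothetical wording tweak: make it obvious the progress bar reflects reading
# an existing safetensors file from disk, not downloading it.
model_file = "/path/to/model-00001-of-00003.safetensors"  # placeholder path
keys = ["weight_a", "weight_b"]  # placeholder keys standing in for f.keys()

for key in tqdm(keys, desc=f"Loading {os.path.basename(model_file)} from disk", leave=False):
    pass  # f.get_tensor(key) would run here in the real loader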
Ok, I'll try this again tonight and see if it happens for me. I believe it's due to the slightly different structure being used in the hf cache folder. But I'll let you know.
@arledesma see my long message in #testers on Discord last night (that I @'d you on) for a detailed description of why this change in studio.py caused me re-downloads: https://github.com/FP-Studio/framepack-studio/pull/295/files#diff-05934289eba73cfacb716666819e900c0a2212ad0c7952f5cbb014617b3b739bR26
Ahh, I see. It's almost odd that users with an explicitly set HF_HOME were not demanding that it be honored and patching studio.py with each change. I had been manually adding it each time that I pulled, and it just got lost in the original PR (as I didn't have the time to finish the work and pushed everything from my local). Maybe, in the future, we can just add a CLI flag to skip setting the environment variable and let the SDK define how it behaves.
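A minimal sketch of one way studio.py could avoid clobbering an existing cache (hypothetical guard, not the current code; the fallback path is an assumption):

import os

# Only set HF_HOME when the user has not already configured it, so existing
# caches (e.g. Pinokio-managed installs) keep being reused and models are not
# re-downloaded into a new location. A CLI flag could also force-skip this.
if "HF_HOME" not in os.environ:
    os.environ["HF_HOME"] = os.path.abspath("./hf_download")  # assumed default cache path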
Brings support from kohya-ss implementations in their FramePack-LoraReady fork as well as their contributions to FramePack-eichi (that do not seem to be correctly attributed to kohya-ss in their primary eichi repo) https://gist.github.com/kohya-ss/fa4b7ae7119c10850ae7d70c90a59277 https://github.com/kohya-ss/FramePack-LoRAReady/blob/3613b67366b0bbf4a719c85ba9c3954e075e0e57 https://github.com/kohya-ss/FramePack-eichi/blob/4085a24baf08d6f1c25e2de06f376c3fc132a470
We do not manage multiple VideoJobQueue instances, so this singleton can be imported and used anywhere that we need access to the queue
Enable switching between known LoRA loaders. Includes a StrEnum implementation for Python 3.10 (or older 3.x) users; otherwise the builtin StrEnum from Python >= 3.11 is used
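A sketch of the compatibility shim described in that commit message (the enum name and members below are illustrative assumptions based on the two loader choices discussed in this PR):

import sys
from enum import Enum

if sys.version_info >= (3, 11):
    from enum import StrEnum  # builtin on Python >= 3.11
else:
    class StrEnum(str, Enum):
        """Minimal stand-in for Python 3.10 and older 3.x."""
        def __str__(self) -> str:
            return str(self.value)

# Hypothetical enum name; the PR's actual enum may be named differently.
class LoraLoader(StrEnum):
    DIFFUSERS = "diffusers"
    LORA_READY = "lora_ready"

print(LoraLoader.LORA_READY == "lora_ready")  # True: members compare as plain strings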
We do not manage multiple Settings objects, so this singleton can be imported and used anywhere that we need access to the Settings
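A minimal sketch of the module-level singleton pattern these two commit messages describe (class, keys, and defaults here are assumptions, not the actual modules):

# settings.py-style module: the instance is created once at import time and
# other modules import it directly instead of constructing their own.
class Settings:
    def __init__(self):
        self._values = {"gpu_memory_preservation": 8.0}  # example default only

    def get(self, key, default=None):
        return self._values.get(key, default)

    def set(self, key, value):
        self._values[key] = value

settings = Settings()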
These are defaulted to continue existing behavior: the diffusers LoRA loader and no reuse of the model instance
…ransformer3DModel This was existing behavior that was found to not be mapped.
This was leading to the model being downloaded again for users that did have HF_HOME set. We will need to document the migration path for existing users to avoid re-downloading all of the models.
Pinokio users (like me) have it set for them (via the Pinokio ENVIRONMENT) without even realizing it. It's possible some others have set it for other AI projects and wouldn't even realize we use it. I know some users with multiple installs have mentioned on the Discord that they use junction links to avoid multiple model copies. But I think it just hasn't been on that many people's radars.
That sounds reasonable, but it's a little outside my lane to actually weigh in unless I learn more about the options and conventions. Whatever we do, we just need to take care that legacy users don't download another 80GB (and maybe not even know if or where there are duplicates to delete).
When loaded from a queue import, the selected_loras and lora_values are equal in length. When loaded from the Generate interface, the lora_values are the same length as the lora_loaded_names and must be reduced. This change now assumes that when the two lists are the same length, they are in the correct order.
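A hedged sketch of the assumption described in that commit message (the function name is hypothetical, not the actual code path):

def resolve_lora_values(selected_loras, lora_values, lora_loaded_names):
    if len(lora_values) == len(selected_loras):
        # Queue import: the values already correspond 1:1 (assumed same order).
        return list(lora_values)
    # Generate interface: values are aligned with lora_loaded_names and must be
    # reduced down to only the selected LoRAs.
    return [lora_values[lora_loaded_names.index(name)] for name in selected_loras]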
diffusers uses this environment variable to automatically download files on import. It's a weird side effect to do that amount of actual work on import.
Force-pushed from b044cb2 to a8cba73 (compare)
So to sum up testing for now:
In Settings it needs the Experimental settings: LoRA loader set to lora_ready (might want to make this the default).
Evaluating if we should default to the new loader. Pros:
Cons:
I'm sure that there are additional pros and cons.
Brings support from kohya-ss implementations in their FramePack-LoraReady fork as well as their contributions to FramePack-eichi
https://gist.github.com/kohya-ss/fa4b7ae7119c10850ae7d70c90a59277
https://github.com/kohya-ss/FramePack-LoRAReady/blob/3613b67366b0bbf4a719c85ba9c3954e075e0e57
https://github.com/kohya-ss/FramePack-eichi/blob/4085a24baf08d6f1c25e2de06f376c3fc132a470
Features: diffusers, lora_ready
UI: diffusers, lora_ready
Additional:
LoRAs like https://civitai.com/models/1518315/transporter-effect-from-star-trek-the-next-generation-or-hunyuan-video-lora are functioning with this implementation, while failing with the existing (diffusers) implementation.
Under the covers
- LoRA keys are named with a prefix of lora_unet_. The lora name replaces any . with a _. The keys then have a format of lora_unet_{lora_name}.
- The lora loader iterates through the lora dictionary keys to rename them to a consistent naming:
  - keys with diffusion_model or transformer that end with lora_A are converted with convert_from_diffusion_pipe_or_something
  - keys with double_blocks or single_blocks are identified as hunyuan and converted with convert_hunyan_to_framepack
- convert_from_diffusion_pipe_or_something renames diffusion_model keys from _lora_A_ to .lora_down. and from _lora_B_ to .lora_up.
- convert_from_diffusion_pipe_or_something assigns the first dimension of the lora_down to the lora weight
- convert_from_diffusion_pipe_or_something adds the alpha with rank to the weight
- convert_hunyan_to_framepack renames and splits keys in double_blocks and single_blocks
- convert_hunyan_to_framepack splits up and slices QKV keys (or QKVM keys) into individual Q, K, V (or Q, K, V, M) keys
- fp8 is out of scope; it has been set to raise an exception when used, to alert future development
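A rough sketch of the renaming step described above (not the actual lora_utils code; the function name and alpha handling are assumptions based on this description): map diffusion-pipe style keys onto the lora_down/lora_up naming and record an alpha equal to the rank, taken from the first dimension of the lora_down weight.

import torch

def rename_diffusion_pipe_keys(state_dict: dict[str, torch.Tensor]) -> dict[str, torch.Tensor]:
    new_sd: dict[str, torch.Tensor] = {}
    for key, value in state_dict.items():
        # _lora_A_ -> .lora_down. and _lora_B_ -> .lora_up., as described above.
        new_key = key.replace("_lora_A_", ".lora_down.").replace("_lora_B_", ".lora_up.")
        new_sd[new_key] = value
        if ".lora_down." in new_key and value.ndim >= 1:
            rank = value.shape[0]  # first dimension of lora_down is the rank
            alpha_key = new_key.split(".lora_down.")[0] + ".alpha"
            new_sd[alpha_key] = torch.tensor(float(rank))  # alpha recorded alongside the weight
    return new_sd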